ABSTRACT

This work examines a semi-blind source separation problem where 
the aim is to separate one source, whose local (nominally periodic) 
structure is partially or approximately known, from another a priori 
unspecified but structured source, given only a single linear combi- 
nation of the two sources. We propose a novel separation technique 
based on local sparse approximations; a key feature of our proce- 
dure is the online learning of dictionaries (using only the data itself) 
which sparsely model the a priori unknown source. We demonstrate 
the performance of our proposed approach via simulation in a styl- 
ized audio source separation problem. 

Index Terms — Semi-blind source separation, sparse represen- 
tations, online dictionary learning 

1. INTRODUCTION 

The blind source separation (BSS) problem entails separating a col- 
lection of signals, each comprised of a superposition of some un- 
known sources, into their constituent components. A canonical ex- 
ample of the BSS task arises in the so-called cocktail party prob- 
lem, and a number of methods have been proposed to address this 
problem. Perhaps the most well-known among these is independent 
component analysis (ICA) [1], where the sources are assumed to be
independent non-Gaussian random vectors. Other approaches entail
more classical matrix factorization techniques like principal
component analysis (PCA) [2-4], or, when appropriate for the underlying
model, non-negative matrix factorization (NNMF) [5].

Here we focus on a slightly different, and often more challeng- 
ing setting - the so-called single channel source separation problem 
- where only a single mixture of the source signals is observed. Sin- 
gle channel source separation problems require the use of some ad- 
ditional a priori knowledge about the sources and their structure in 
order to perform separation [6-9]. Here, we assume that the local
structure of one of the source signals is approximately known (in 
a manner described in more detail below), and our aim is to sep- 
arate this partially known source from an unknown "background" 
source. Our task is motivated by an audio processing application in 
law enforcement scenarios where electroshock devices are used. A 
key forensic task in these scenarios is to determine, from audio data 
recorded by the device itself, the resistive load encountered by the 
device (corresponding to qualitatively "low" and "high" resistance 
loads). The approach proposed here aims to separate the audio
corresponding to a nominally periodic and approximately known (up 
to the resistive load ambiguities) discharge from otherwise unknown, 
but often highly structured, background audio. The separated audio 
signal can subsequently be used to classify the state of the resistive
load (we consider only the separation task here). 

Our separation approach is based on local sparse approximations 
of the mixture data. A novel feature of our proposed method is in 
our representation of the unknown background source - we describe 
a technique for learning (from the data itself) a model that sparsely 
represents the unknown background source, using tools from the
dictionary learning literature (see, e.g., [10-12]). The next section
describes the problem we consider here more formally, and discusses
the nature of our contributions in the context of existing works in 
sparse representation, dictionary learning, and low-rank modeling. 

2. BACKGROUND AND PROBLEM FORMULATION 

Our effort here is motivated by a single-channel semi-blind audio 
source separation problem, in which the goal is to separate a nom- 
inally periodic and approximately known signal from unknown but 
structured background interference, given only a superposition of the 
two sources. Let $x \in \mathbb{R}^n$ represent our observed data, and suppose
that $x$ may be decomposed as a sum of two sources, one of which
($x_p \in \mathbb{R}^n$) exhibits local structure that is partially or approximately
known, and the other ($x_u \in \mathbb{R}^n$) is unknown. In our motivating audio
application, for example, $x$ is comprised of samples of an underlying
continuous-time waveform, and we consider $x_p$ to be samples of a
source that is a nominally regular repetition of one of a small
number of prototype signals. One example scenario where this model
is applicable is the case where $x_p$ is, up to some unknown offset
jitter, periodic. Our aim is to separate the sources $x_p$ and $x_u$ from
observations of $x$, which may be noisy or otherwise corrupted.

Our proposed approach is based on the principle of local sparse 
approximations. In order to state our overall problem in generality, 
we describe an equivalent model for our data x that facilitates the 
local analysis inherent to our approach. Let us suppose that m is an 
integer that divides $n$ evenly, such that $n/m = q$, an integer. Then
$x \in \mathbb{R}^n$ may be represented equivalently as an $m \times q$ matrix $X$:

$$X = X_p + X_u, \tag{1}$$

where $X_p$ is a matrix whose columns are non-overlapping length-$m$
segments of $x_p$, and similarly for $X_u$. The goal of our effort is, in
essence, to separate $X$ into its constituent matrices $X_p$ and $X_u$.
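
As a small illustration of the block representation in (1), the following sketch arranges a length-$n$ signal into an $m \times q$ matrix of non-overlapping segments and back again. The helper names `to_block_matrix` and `from_block_matrix` are hypothetical and chosen only for this example.

```python
import numpy as np

def to_block_matrix(x, m):
    """Arrange a length-n signal into an m x q matrix whose columns are
    consecutive non-overlapping length-m segments (m must divide n)."""
    x = np.asarray(x)
    n = x.size
    if n % m != 0:
        raise ValueError("m must divide the signal length n")
    q = n // m
    # Row-major reshape gives one segment per row; transpose to get columns.
    return x.reshape(q, m).T

def from_block_matrix(X):
    """Inverse operation: concatenate the columns back into a 1-D signal."""
    return np.asarray(X).T.reshape(-1)

x = np.arange(12.0)
X = to_block_matrix(x, m=4)  # shape (4, 3); column j holds x[4*j : 4*(j+1)]
```

Because the mapping is linear and invertible, separating $X$ into $X_p$ and $X_u$ is equivalent to separating $x$ into $x_p$ and $x_u$.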

As alluded to above, our separation approach entails leveraging
local structure in each of the components of $X$. Our main contribution
comes in the form of a procedure that, given our "partial" information
about the columns of $X_p$, enables us to learn in an online fashion
and from the data itself a dictionary $D$ such that columns of $X_u$ are
accurately expressed as linear combinations of (a small number of)
columns of $D$. In a broader sense, our work is related to some classical
approximation approaches as well as several recent works on
matrix decomposition. We briefly describe these background and related
efforts here, in an effort to put our main contribution in context.

2.1. Prior Art 

2.1.1. Low Rank and Robust Low Rank Approximation 

Consider the model (1) and suppose that the columns of $X_p$ can each
be represented as a linear combination of some $r$ linearly independent
vectors, implying that $X_p$ is a matrix of rank $r$. Now, different
separation techniques may be employed depending on our assumptions
on $X_u$. Perhaps the simplest case is where $X_u$ is random noise
(e.g., having entries that are iid zero-mean Gaussian); in this case,
the problem amounts to a denoising problem, which can be solved
using ideas from low-rank matrix approximation. In particular, it is
well-known that the approximation $\hat{X}_p$ obtained via the truncated
(to rank $r$) singular value decomposition (SVD) of $X$ is a solution
of the optimization

$$\hat{X}_p = \arg\min_{L:\ \mathrm{rk}(L) \le r} \|X - L\|_F^2, \tag{2}$$

where $\mathrm{rk}(L)$ is the function that returns the rank of $L$.
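
The truncated-SVD estimate in (2) can be sketched in a few lines of NumPy. The toy dimensions, noise level, and the function name `truncated_svd` below are assumptions made for illustration only.

```python
import numpy as np

def truncated_svd(X, r):
    """Best rank-r approximation of X in Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the r largest singular values and their singular vectors.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

# Toy data (assumed): a rank-2 component plus small iid Gaussian noise.
rng = np.random.default_rng(0)
m, q, r = 20, 30, 2
Xp = rng.standard_normal((m, r)) @ rng.standard_normal((r, q))
X = Xp + 0.01 * rng.standard_normal((m, q))
Xp_hat = truncated_svd(X, r)
```

In this Gaussian-noise setting the truncation discards most of the noise energy, so `Xp_hat` is closer to `Xp` than the raw observation `X` is.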

It is well-known that certain (non-Gaussian) forms of interference
$X_u$ may cause the accuracy of estimators of the low-rank component
obtained via truncated SVD to degrade significantly. This is
the case, for example, when $X_u$ is comprised of sparse, large-amplitude
impulsive noise. In these cases, the low-rank approximation
problem can be modified to its robust counterpart, which goes by
the name of robust PCA in the literature [13, 14]. The robust PCA
approach aims to simultaneously estimate both the low-rank $X_p$ and
the sparse $X_u$, by solving the convex optimization

$$\{\hat{X}_p, \hat{X}_u\} = \arg\min_{L,S} \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad X = L + S, \tag{3}$$

where $\lambda > 0$ is a regularization parameter. Here $\|L\|_*$ denotes the
nuclear norm of $L$, which is the sum of the singular values of $L$. The
nuclear norm is a convex relaxation of the non-convex rank function
$\mathrm{rk}(L)$. Further, $\|S\|_1$ is the sum of the absolute values of the
entries of $S$, essentially the $\ell_1$ norm of a vectorized version of $S$,
which is a convex relaxation of the non-convex $\ell_0$ quasinorm that
counts the number of nonzeros of $S$.
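
To make the two norms in (3) concrete, the sketch below evaluates them and implements their proximal operators (entrywise soft thresholding and singular value thresholding). These are standard building blocks of many iterative solvers for problems of this type; they are shown here as an illustration and are not claimed to be the specific solver used in the cited robust PCA works. All function names are hypothetical.

```python
import numpy as np

def nuclear_norm(L):
    """Sum of the singular values of L."""
    return np.linalg.svd(L, compute_uv=False).sum()

def l1_norm(S):
    """Sum of the absolute values of the entries of S."""
    return np.abs(S).sum()

def soft_threshold(Z, tau):
    """Proximal operator of tau * ||.||_1: shrink each entry toward zero."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def singular_value_threshold(Z, tau):
    """Proximal operator of tau * ||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

Z = np.diag([3.0, 1.0])
L_shrunk = singular_value_threshold(Z, 2.0)  # singular values 3, 1 become 1, 0
```

Soft thresholding promotes sparsity in the entries, while singular value thresholding promotes sparsity in the spectrum, that is, low rank.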

Here, of course, we explicitly assume that $X_u$ is more highly
structured, making the separation problem better suited to a new
suite of techniques that explicitly exploit such structure.

2.1.2. Low Rank Plus Sparse in a Known Dictionary

A useful extension of the robust PCA approach arises in the case 
where Xu is not itself sparse, but possesses a sparse representation 
in some known dictionary or basis. One example is the case where 
the background source is locally smooth, implying it can be sparsely 
represented using a few low-frequency discrete cosine transform or 
Fourier basis elements. Formally, suppose that for some known
matrix $D$, we have that $X_u = D A_u$, where the columns of $A_u$ are
sparse. The components of $X$ can be estimated by solving the
following optimization [15]:

$$\{\hat{X}_p, \hat{A}_u\} = \arg\min_{L,A} \|L\|_* + \lambda \|A\|_1 \quad \text{subject to} \quad X = L + DA. \tag{4}$$

Note that an estimate $\hat{X}_u$ of $X_u$ may be obtained directly as
$\hat{X}_u = D \hat{A}_u$. This approach assumes (implicitly) a priori knowledge of a
dictionary that sparsely represents the background signal, which may
be a restrictive assumption in practice.
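
As an example of a known dictionary of the kind discussed above, the following sketch builds an orthonormal DCT-II basis and represents a smooth column using two low-frequency atoms. The block length, the chosen atoms, and the name `dct_dictionary` are assumptions for illustration.

```python
import numpy as np

def dct_dictionary(m):
    """Orthonormal DCT-II basis; column k is the k-th cosine atom of length m."""
    n = np.arange(m)
    D = np.cos(np.pi * (n[:, None] + 0.5) * n[None, :] / m)
    D[:, 0] *= np.sqrt(1.0 / m)   # scaling of the constant atom
    D[:, 1:] *= np.sqrt(2.0 / m)  # scaling of the remaining atoms
    return D

m = 64
D = dct_dictionary(m)

# A smooth column built from two low-frequency atoms (assumed toy example).
a_true = np.zeros(m)
a_true[[1, 3]] = [2.0, -1.0]
x_u = D @ a_true

# Because D is orthonormal, the coefficients are recovered by D^T x_u.
a = D.T @ x_u
```

Here `a` has only two nonzero entries, which is the sense in which a locally smooth background is sparse in the DCT dictionary.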

2.1.3. Morphological Component Analysis 

A more general model arises when $X_p$ is not low-rank, but instead,
its columns are also sparsely represented in a known dictionary.
Suppose that $X_p$ and $X_u$ are sparsely represented in some known
dictionaries $D_1$ and $D_2$, such that $X_p = D_1 A_1$ and $X_u = D_2 A_2$,
and that the columns of $A_1$ and $A_2$ are sparse. Such models were
employed in recent work on Morphological Component Analysis
(MCA) [16-18], which aimed to separate a signal into its component
sources based on structural differences codified in the columns
of the known dictionaries. The MCA decomposition can be accomplished
by solving the following optimization

$$\{\hat{A}_1, \hat{A}_2\} = \arg\min_{A_1, A_2} \|X - D_1 A_1 - D_2 A_2\|_F^2 \quad \text{subject to} \quad \|A_1\|_1 + \|A_2\|_1 \le \lambda, \tag{5}$$

for some $\lambda > 0$, where the estimates of $X_p$ and $X_u$ are formed as
$\hat{X}_p = D_1 \hat{A}_1$ and $\hat{X}_u = D_2 \hat{A}_2$, respectively. When $X_p$ and $X_u$
are each comprised of a single column, this optimization is equivalent
to the so-called Basis Pursuit (or more specifically, Basis Pursuit
Denoising) technique [19], which formed a foundation of much
of the recent work in sparse approximation. Note that, as with the
previously mentioned approach, this approach also assumes a priori
knowledge of a dictionary that sparsely represents the background.
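
A minimal sketch of MCA with two known dictionaries follows. It solves the penalized (Lagrangian) counterpart of the constrained problem (5) using the iterative shrinkage-thresholding algorithm (ISTA), which is a substitution made here for simplicity and not necessarily the solver used in the cited MCA works. The toy signal (one spike plus one DCT atom), the value of `lam`, and all function names are assumptions.

```python
import numpy as np

def soft_threshold(Z, tau):
    """Entrywise soft thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def lasso_ista(X, D, lam, n_iter=500):
    """Minimize ||X - D A||_F^2 + lam * ||A||_1 over A with ISTA."""
    lipschitz = 2.0 * np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    step = 1.0 / lipschitz
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ A - X)
        A = soft_threshold(A - step * grad, step * lam)
    return A

def mca_separate(X, D1, D2, lam, n_iter=500):
    """Sparse-code X in the concatenated dictionary [D1 D2], then split."""
    d1 = D1.shape[1]
    A = lasso_ista(X, np.hstack([D1, D2]), lam, n_iter)
    return D1 @ A[:d1], D2 @ A[d1:]

# Toy example (assumed): D1 = identity (spikes), D2 = orthonormal DCT-II basis.
m = 32
n = np.arange(m)
D1 = np.eye(m)
D2 = np.sqrt(2.0 / m) * np.cos(np.pi * (n[:, None] + 0.5) * n[None, :] / m)
D2[:, 0] /= np.sqrt(2.0)

x_p = np.zeros(m)
x_p[5] = 3.0            # a single spike
x_u = 2.0 * D2[:, 2]    # a single smooth cosine atom
X = (x_p + x_u).reshape(-1, 1)

Xp_hat, Xu_hat = mca_separate(X, D1, D2, lam=0.1)
```

The separation works in this toy case because spikes and cosines are mutually incoherent: each component is sparse in its own dictionary and not in the other.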

2.2. Our Contribution: "Semi-blind" Morphological Component Analysis

Our focus here is similar to the MCA approach above, but we assume
that only one of the dictionaries, say $D_1$, is known. In this case,
the MCA approach transforms into a semi-blind separation problem
where we also try to learn a dictionary $D_2$ to represent the unknown
signal. Our main contribution comes in the form of a "Semi-Blind"
MCA procedure, designed to solve the following modified form of
the MCA decomposition

$$\{\hat{A}_1, \hat{A}_2, \hat{D}_2\} = \arg\min_{A_1, A_2, D_2} \|X - D_1 A_1 - D_2 A_2\|_F^2 \quad \text{subject to} \quad \|A_1\|_1 + \|A_2\|_1 \le \lambda, \tag{6}$$

and this problem forms the basis of the remainder of this paper.
Specifically, in Section 3 we propose a procedure, based on alternating
minimization, for obtaining local solutions to optimizations
of the form (6). In Section 4 we examine the performance of our
proposed approach in an application motivated by an audio source
separation problem in audio forensics. Finally, we discuss conclusions
and possible extensions in Section 5.

3. SEMI-BLIND MCA 

As described above, our model assumes that the data matrix $X$ can
be expressed as the superposition of two component matrices, $X_p$
and $X_u$. Further, we assume that each of the component matrices
possesses a sparse representation in some dictionary, such that
$X_p \approx D_1 A_1$ and $X_u \approx D_2 A_2$, where $D_1$ is known a priori. Our essential
aim, then, is to identify an estimate $\hat{A}_1$ of the coefficient matrix $A_1$
and estimates $\hat{D}_2$ and $\hat{A}_2$ of the matrices $D_2$ and $A_2$. Our estimates
of the separated components are then given by $\hat{X}_p = D_1 \hat{A}_1$ and
$\hat{X}_u = \hat{D}_2 \hat{A}_2$.



Algorithm 1 Semi-Blind MCA Algorithm

Input: Original data $X \in \mathbb{R}^{m \times q}$, known dictionary $D_1$ (with $m$ rows),
    regularization parameters $\lambda_1, \lambda_2, \lambda_3 > 0$,
    number of elements in unknown dictionary $\ell$.

Initialize: $\hat{A}_1 \leftarrow \arg\min_{A_1} \|X - D_1 A_1\|_F^2 + \lambda_1 \|A_1\|_1$
    (or other suitable initialization depending on the problem.)

Iterate (repeat until convergence):
    Dictionary Learning:
        $\{\hat{D}_2, \hat{A}_2\} \leftarrow \arg\min_{D_2, A_2} \|X - D_1 \hat{A}_1 - D_2 A_2\|_F^2 + \lambda_2 \|A_2\|_1$
    Coefficient Update:
        $[\hat{A}_1^T\ \hat{A}_2^T]^T = \hat{A} \leftarrow \arg\min_{A} \|X - \hat{D} A\|_F^2 + \lambda_3 \|A\|_1$, with $\hat{D} = [D_1\ \hat{D}_2]$

Output: Learned dictionary $\hat{D}_2$,
    coefficient estimates $\hat{A}_1$, $\hat{A}_2$.



We propose an approach to solve (6) that is based on alternating
minimization, and is summarized here as Algorithm 1. Let
$\lambda_1, \lambda_2, \lambda_3 > 0$ be user-specified regularization parameters. Our
initial estimate of coefficients $A_1$, corresponding to the coefficients
of $X_p$ in the known dictionary $D_1$, is obtained via

$$\hat{A}_1 = \arg\min_{A_1} \|X - D_1 A_1\|_F^2 + \lambda_1 \|A_1\|_1, \tag{7}$$

which is a simple LASSO-type problem. We then proceed in an iterative
fashion, as outlined in the following subsections, for a few
iterations or until some appropriate convergence criterion is satisfied.
It should be noted that the lack of joint convexity makes the SBMCA
algorithm sensitive to initialization. Therefore, any suitable initialization
using sparse approximation techniques, depending upon the
problem setting, can be employed. This is illustrated in Section 4,
where we consider an audio forensics application.

3.1. Dictionary learning stage 

Given the estimate $\hat{A}_1$, we can essentially "subtract" the current
estimate of $X_p$ from $X$, and apply a dictionary learning step to identify
estimates of the unknown dictionary $D_2$ and the corresponding
coefficients $A_2$. In other words, we solve

$$\{\hat{D}_2, \hat{A}_2\} = \arg\min_{D_2, A_2} \|X - D_1 \hat{A}_1 - D_2 A_2\|_F^2 + \lambda_2 \|A_2\|_1. \tag{8}$$

Now, given the estimate $\hat{D}_2$, we update our current estimate of the
overall dictionary $\hat{D} = [D_1\ \hat{D}_2]$. We then update the overall
coefficient matrix by solving another sparse approximation problem, as
described next.

3.2. Sparse approximation stage 

Given our current estimate of the overall dictionary, we update the
corresponding coefficient matrices by solving the following LASSO-like
problem:

$$[\hat{A}_1^T\ \hat{A}_2^T]^T = \hat{A} = \arg\min_{A} \|X - \hat{D} A\|_F^2 + \lambda_3 \|A\|_1. \tag{9}$$

Now, we extract the submatrix $\hat{A}_1$ from $\hat{A}$, and repeat the overall
processing (beginning with the dictionary learning step). These steps
are iterated until some appropriate convergence criterion is satisfied.
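
The alternating procedure of Sections 3.1 and 3.2 can be sketched as follows. This is an illustrative sketch under stated assumptions and not the authors' implementation: the LASSO problems are solved with plain ISTA, and the dictionary learning stage is replaced by a simple batch scheme (ISTA sparse coding alternated with a least-squares dictionary update and atom renormalization) standing in for the online dictionary learning tools cited in the text. The function names, iteration counts, and toy data are all hypothetical.

```python
import numpy as np

def soft_threshold(Z, tau):
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)

def lasso_ista(X, D, lam, n_iter=200):
    """Minimize ||X - D A||_F^2 + lam * ||A||_1 over A with ISTA."""
    lipschitz = max(2.0 * np.linalg.norm(D, 2) ** 2, 1e-12)
    step = 1.0 / lipschitz
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        A = soft_threshold(A - step * 2.0 * D.T @ (D @ A - X), step * lam)
    return A

def learn_dictionary(R, n_atoms, lam, n_outer=10, seed=0):
    """Batch stand-in (assumption) for the dictionary learning stage (8)."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((R.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_outer):
        A = lasso_ista(R, D, lam)
        # Least-squares dictionary update, then renormalize atoms to unit norm.
        D_new = R @ np.linalg.pinv(A)
        norms = np.linalg.norm(D_new, axis=0, keepdims=True)
        alive = norms > 1e-12
        # Unused atoms keep their previous value instead of collapsing to zero.
        D = np.where(alive, D_new / np.where(alive, norms, 1.0), D)
    return D, lasso_ista(R, D, lam)

def sbmca(X, D1, n_atoms, lam1, lam2, lam3, n_iter=5, A1_init=None):
    """Alternate the dictionary learning stage (8) and the coefficient update (9)."""
    d1 = D1.shape[1]
    A1 = lasso_ista(X, D1, lam1) if A1_init is None else A1_init  # initialization (7)
    for _ in range(n_iter):
        # Stage (8): fit D2 and A2 to the residual X - D1 A1.
        D2, A2 = learn_dictionary(X - D1 @ A1, n_atoms, lam2)
        # Stage (9): re-estimate all coefficients in the dictionary [D1 D2].
        A = lasso_ista(X, np.hstack([D1, D2]), lam3)
        A1, A2 = A[:d1], A[d1:]
    return D1 @ A1, D2 @ A2, D2

# Toy data (assumed): each column mixes one known atom and one unknown atom.
rng = np.random.default_rng(0)
m, q, d1, ell = 16, 40, 4, 3
D1 = rng.standard_normal((m, d1))
D1 /= np.linalg.norm(D1, axis=0)
D2_true = rng.standard_normal((m, ell))
D2_true /= np.linalg.norm(D2_true, axis=0)
A1_true = np.zeros((d1, q))
A1_true[rng.integers(0, d1, q), np.arange(q)] = 1.0
A2_true = np.zeros((ell, q))
A2_true[rng.integers(0, ell, q), np.arange(q)] = 1.0
X = D1 @ A1_true + D2_true @ A2_true

Xp_hat, Xu_hat, D2_hat = sbmca(X, D1, ell, lam1=0.1, lam2=0.1, lam3=0.1, n_iter=3)
```

Since the overall problem is not jointly convex, the result of such a procedure depends on the initialization, which is why the `A1_init` argument is exposed in this sketch.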

Fig. 1: A segment of mixture components (noise free): (a) the nominally
periodic signal $x_p$ (each segment is the discharge corresponding
to one of the two resistive load states, randomly selected); (b) the
background signal $x_u$; (c) the mixture $x$.



4. EVALUATION: AN APPLICATION IN AUDIO 
FORENSICS 

We demonstrate the performance of our approach on a stylized ver- 
sion of the audio separation task described in the introduction, which 
is motivated by forensic examination of audio obtained during law 
enforcement events where electroshock devices are utilized. For the 
sake of this example, we suppose that the electroshock devices dis- 
charge approximately 36 times per second, and the waveforms gen- 
erated by the device during discharge take one of two different forms 
depending on the level of resistive load encountered by the device. 
The collected audio corresponds to the nominally periodic discharge 
of the device, superimposed with background noise (e.g., speech).
Our aim is to separate this superposition into its components. 

Figure 1 shows a segment of the signals used in the simulation.
We simulate the form of the approximately periodic signals ($x_p$),
shown in Figure 1(a), using two distinct exponentially decaying
sinusoids that emulate series RLC circuits with different parameters,
modeling the loaded and open-circuit states. Specifically, we generate
two distinct waveforms, which correspond to the two states (high
and low resistive load), and form the overall signal $x_p$ by concatenating
randomly selected versions of these prototype signals, each of
which is subject to a few samples of timing offset in order to model
the non-idealities of the actual electroshock device. A speech signal,
shown in Figure 1(b), was used to model background noise that
may be present during the altercation. We simulate the overall raw
audio data as a linear combination of $x_p$, $x_u$, and zero-mean random
Gaussian noise $\mathcal{N}(0, \sigma^2)$ (Figure 1(c) depicts the ideal case $\sigma = 0$).
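
A rough sketch of how such a test signal could be generated is given below. All numerical values are assumptions for illustration: the sampling rate of 14400 Hz is chosen only so that 36 discharges per second gives 400 samples per period, the frequencies and decay rates of the two prototypes are arbitrary, the timing offset is modeled as a circular shift within each block, and a sinusoid stands in for the speech signal used as background in the paper.

```python
import numpy as np

def decaying_sinusoid(m, freq_hz, decay_per_s, fs):
    """Exponentially decaying sinusoid, as produced by an underdamped series RLC circuit."""
    t = np.arange(m) / fs
    return np.exp(-decay_per_s * t) * np.sin(2 * np.pi * freq_hz * t)

def simulate_discharge_train(prototypes, q, max_jitter, rng):
    """Concatenate q randomly chosen prototypes, each circularly shifted by a few samples."""
    states = rng.integers(0, prototypes.shape[0], size=q)
    shifts = rng.integers(-max_jitter, max_jitter + 1, size=q)
    blocks = [np.roll(prototypes[s], k) for s, k in zip(states, shifts)]
    return np.concatenate(blocks), states

fs, m, q = 14400, 400, 36            # assumed: one second of audio, 36 periods
rng = np.random.default_rng(0)
prototypes = np.stack([
    decaying_sinusoid(m, 900.0, 300.0, fs),    # hypothetical "low resistance" pulse
    decaying_sinusoid(m, 1800.0, 600.0, fs),   # hypothetical "high resistance" pulse
])
x_p, states = simulate_discharge_train(prototypes, q, max_jitter=3, rng=rng)

# Placeholder background (assumption); the paper uses a recorded speech signal.
x_u = np.sin(2 * np.pi * 150.0 * np.arange(m * q) / fs)

sigma = 0.1
x = x_p + x_u + sigma * rng.standard_normal(m * q)
```

The array `states` records which prototype generated each period, which is the quantity a downstream classifier would try to recover.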

Fig. 2: Histogram of normalized error-per-block, measured using the vector $\ell_2$-norm, of the extracted nominally periodic signal $\hat{x}_p$ and the extracted
speech signal $\hat{x}_u$ for Semi-blind MCA, MCA-DCT, and MCA-Identity, for the audio forensic application.

The data matrix $X$ is formed from the signal $x$ as discussed in
Section 2 using non-overlapping segments with 400 samples each,
and we form the dictionary $D_1$ by incorporating certain circular
shifts of the nominal prototype pulses from which $x_p$ was generated.
We then employ the semi-blind MCA approach (discussed in
Section 3) to separate the background audio from the approximately
known periodic portion.
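
One way to build a dictionary of circularly shifted prototype pulses is sketched below. The range of shifts, the unit-norm scaling of the atoms, and the small stand-in prototypes are assumptions; the text does not specify which shifts are included.

```python
import numpy as np

def shift_dictionary(prototypes, max_shift):
    """Dictionary whose columns are circular shifts of each prototype pulse.

    prototypes: array of shape (n_prototypes, m).
    Returns an m x (n_prototypes * (2 * max_shift + 1)) matrix with
    unit-norm columns (the normalization is an assumption of this sketch).
    """
    atoms = [np.roll(p, k)
             for p in prototypes
             for k in range(-max_shift, max_shift + 1)]
    D1 = np.stack(atoms, axis=1)
    return D1 / np.linalg.norm(D1, axis=0, keepdims=True)

# Small stand-in prototypes (hypothetical) of length m = 8.
t = np.arange(8)
prototypes = np.stack([np.exp(-0.5 * t), np.exp(-1.0 * t) * np.cos(t)])
D1 = shift_dictionary(prototypes, max_shift=2)  # shape (8, 10)
```

With such a dictionary, a block containing a single jittered discharge is well approximated by one column of $D_1$, so both the load state and the timing offset are encoded in the index of the selected atom.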

We compare the performance of our approach with two versions
of MCA, one using the DCT basis and the other using the identity
basis to form the dictionary $D_2$. We use the estimate of $X_p$ obtained
via the MCA-DCT procedure to initialize our approach, as follows: we
apply one step of orthogonal matching pursuit (OMP) [20] to the
estimate of $X_p$ obtained via MCA-DCT to form the initial (one
component per column) estimate $\hat{A}_1$ for the SBMCA algorithm.
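
A single OMP step amounts to picking, for each column, the atom most correlated with it and fitting one coefficient by least squares. The sketch below illustrates this; the function name `one_step_omp` and the tiny example are hypothetical.

```python
import numpy as np

def one_step_omp(X, D):
    """One OMP iteration per column: select the best-matching atom of D and
    compute its least-squares coefficient. Returns a coefficient matrix with
    exactly one (possibly zero-valued) entry set per column."""
    norms = np.linalg.norm(D, axis=0)
    corr = D.T @ X                                   # shape (n_atoms, q)
    idx = np.argmax(np.abs(corr) / norms[:, None], axis=0)
    cols = np.arange(X.shape[1])
    A = np.zeros((D.shape[1], X.shape[1]))
    # Least-squares coefficient on the selected atom: <d, x> / ||d||^2.
    A[idx, cols] = corr[idx, cols] / norms[idx] ** 2
    return A

D = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])
A_true = np.array([[3.0, 0.0],
                   [0.0, -1.0]])
A_hat = one_step_omp(D @ A_true, D)
```

In the setting of the text, `X` would be the MCA-DCT estimate of $X_p$ and `D` the known dictionary $D_1$, giving a one-component-per-column initial estimate of $A_1$.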

Table 1 lists the best achievable reconstruction SNRs (in dB) of
each method. We note that our interest here is in comparing the best 
performances achieved by MCA and our proposed method, so we 
clairvoyantly tune the value(s) of the regularization parameter to give 
the lowest error for each task. (In general, a different regularization 
parameter may have been utilized to obtain the reconstruction SNRs 
of each signal component, even for the same method and same noise 
level - in other words, the SNRs listed may not be jointly achievable 
from a single implementation of any of the stated procedures). 

Table 1: Comparative analysis of reconstruction SNR (in dB).

¹ Panels (a), (e) and (i) show the histogram of normalized error-per-block
for $\hat{x}_p$, and (b), (f) and (j) show the histogram of normalized
error-per-block for $\hat{x}_u$, via SBMCA, MCA-DCT and MCA-Identity respectively,
with standard deviation of the Gaussian noise $\sigma = 0$. Panels (c), (g) and (k)
show the histogram of normalized error-per-block for $\hat{x}_p$, and (d), (h) and (l)
show the histogram of normalized error-per-block for $\hat{x}_u$, via SBMCA,
MCA-DCT and MCA-Identity respectively, with standard deviation of the
Gaussian noise $\sigma = 0.1$.

A second, perhaps more interesting, performance comparison is
shown in Figure 2, which depicts the histogram of normalized
errors-per-block, measured using the vector $\ell_2$-norm, for each method.¹
We observe from the distribution of $\ell_2$-errors across blocks that
the SBMCA procedure (Figure 2(a-d)) results in a larger number of
blocks with lower errors as compared to MCA-DCT (Figure 2(e-h))
and MCA-Identity (Figure 2(i-l)). This feature is of primary
importance in the audio forensics application, where classifying each
period of the nominally periodic signal $x_p$ as one of the two
prototype signals is of interest.
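
The two figures of merit used in this section can be computed as sketched below. The exact definitions are assumptions: reconstruction SNR is taken as the ratio of signal energy to error energy in dB, and the normalized error-per-block as the $\ell_2$ norm of the error in each column divided by the $\ell_2$ norm of the true column.

```python
import numpy as np

def reconstruction_snr_db(x_true, x_hat):
    """Reconstruction SNR in dB: 10 log10(||x||^2 / ||x - x_hat||^2)."""
    x_true = np.asarray(x_true, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return 10.0 * np.log10(np.sum(x_true ** 2) / np.sum((x_true - x_hat) ** 2))

def normalized_block_errors(X_true, X_hat):
    """Per-column l2 error, normalized by the l2 norm of the true column."""
    num = np.linalg.norm(X_true - X_hat, axis=0)
    den = np.linalg.norm(X_true, axis=0)
    return num / den

# Tiny example with two blocks (columns).
X_true = np.array([[3.0, 0.0],
                   [4.0, 2.0]])
X_hat = np.array([[3.0, 0.0],
                  [4.0, 1.0]])
snr = reconstruction_snr_db(X_true, X_hat)
errs = normalized_block_errors(X_true, X_hat)   # first block exact, second off by half

# Histogram of per-block errors, in the spirit of Figure 2.
counts, edges = np.histogram(errs, bins=10, range=(0.0, 1.0))
```

A single SNR value summarizes the error over the whole signal, whereas the per-block histogram shows how many individual periods are reconstructed well, which matters when each period is classified separately.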

5. CONCLUSION 

We proposed a semi-blind source separation technique based on 
local sparse approximations. Our approach exploits partial prior 
knowledge of one of the sources, in the form of a dictionary which 
sparsely represents local segments of one of the sources. A key fea- 
ture of our approach is the online learning of a dictionary (from the 
mixed source data itself) for representing the unknown background 
source. We posed the problem as an optimization task, proposed a 
solution approach based on alternating minimization, and verified 
its effectiveness via simulation in a stylized audio forensics application.
Possible extensions to other applications (e.g., image and video
processing) are left to future efforts. 